Laboratoire Lorrain de Recherche en Informatique et ses Applications - 2005 Activity Report
Abstract
categorial grammars are not intended as yet another grammatical formalism that would compete with other established formalisms. It should rather be seen as the kernel of a grammatical framework in which other existing grammatical models may be encoded.

4.1.2. Interaction Grammars

Interaction Grammars (IGs) are a linguistic formalism that aims at modelling both the syntax and the semantics of natural languages according to the following principles:

• An IG is a monotonic system of constraints, as opposed to a derivational/transformational system, and this system is multidimensional: at the syntactic level, basic objects are tree descriptions, and at the semantic level, basic objects are Directed Acyclic Graph descriptions.
• The synchronization between the syntactic and the semantic levels is realized in a flexible way by a partial function that maps syntactic nodes to semantic nodes.
• Much in the spirit of Categorial Grammars, the resource sensitivity of natural language is built into the formalism: syntactic composition is driven by an operation of cancellation between polarized morpho-syntactic features and, in parallel, semantic composition is driven by a similar operation of cancellation between polarized semantic features.

The formalism of IG stems from a reformulation of proof nets of Intuitionistic Linear Logic (which have very specific properties) in a model-theoretical framework [39], and it was at first designed for modelling the syntax of natural languages [40].

4.1.3. Grammatical and lexical resources for French

The relevance of new linguistic formalisms needs to be proved by experiments on real corpora. Parsing real corpora requires large-scale grammars and lexicons. There is a crucial lack of such resources for French, and all researchers involved in NLP projects for French, whatever formalism they rely on, are confronted with the same problem.
Now, building large-scale grammars and lexicons for French demands a lot of time and human resources, and it is crucial to overcome the multiplicity of existing formalisms by developing common and reusable tools and data. This motivates two directions of research:

1. The modular organization of formal grammars in a hierarchy of classes allows the expression of linguistic generalizations and makes their development and maintenance possible on a large scale. To be used in NLP applications, such modular grammars have to be compiled into operational grammars. By analogy with programming languages, we write source grammars in a language with a high level of abstraction and then compile them automatically into object grammars, directly usable by NLP applications. Considering the multiplicity of linguistic formalisms, it would be interesting to express the various source grammars that can be written in different formalisms in a common abstract language, and to compile them with the same tool associated with this language. XMG [21] is a first experiment in this direction: for the moment, it allows the editing and compilation of source grammars for TAGs and IGs. Moreover, we can hope that the use of a common language of syntactic description with a high level of abstraction will make it easier to reuse parts of grammars from one formalism in another.
2. With the same concern for reusability, it is important to develop syntactic and semantic lexicons which contain only purely linguistic information and which are independent of the different existing grammatical formalisms. A mechanism must then be provided to combine these lexicons with the grammars built in the various formalisms. A convenient way of doing this is to design the entries of such lexicons in the form of feature structures and to also associate feature structures with the elementary constructions of the grammars.
   Then, their anchoring in the lexicons is realized by unification of the two kinds of feature structures. The construction of a syntactic and a semantic lexicon for French can be envisaged either by acquisition from corpora or by re-use of existing lexical information.

4.2. Termination and complexity of programs

The theory of implicit complexity is quite new and there is still much to do. It is therefore important to translate current theoretical tools into real applications; this should help validate and guide our hypotheses. To this end, three directions are being explored:

1. First-order functional programming. A first prototype, called ICAR, has been developed and should be integrated into ELAN (http://elan.loria.fr).
2. Extracting programs from proofs. Here, one should build logical theories in which programs extracted via the Curry-Howard isomorphism are efficient.
3. Application to mobile code systems. This work is starting in collaboration with the INRIA Cristal and Mimosa project-teams.

5. Software

5.1. Leopar

5.1.1. Description of the software

LEOPAR is a parser for natural languages based on the formalism of Interaction Grammars (IG) [40]. It uses a parsing principle called “electrostatic parsing”, which is based on neutralizing opposite polarities. A positive polarity corresponds to an available linguistic feature and a negative one to an expected feature.

Parsing a sentence with an Interaction Grammar (IG) consists in first selecting a lexical entry for each of its words. A lexical entry is an underspecified syntactic tree, in other words a tree description. Then, all selected tree descriptions are combined by partial superposition guided by the aim of neutralizing polarities: two opposite polarities are neutralized by merging their support nodes. Parsing succeeds if the process ends with a minimal and neutral tree.
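The neutralization of polarities by node merging can be illustrated with a toy sketch. It is not LEOPAR's actual data structure or algorithm: a node is flattened to a dictionary mapping each feature to a (polarity, value) pair, the three polarity symbols and their composition table are simplified assumptions (real IGs use a richer set of polarities as well as tree-description constraints), and feature values are atomic strings that must simply be identical.

```python
from typing import Optional

# Assumed polarity symbols (a simplification of IG polarities):
#   "+" : positive, an available feature
#   "-" : negative, an expected feature
#   "=" : neutral, nothing left to cancel
# Assumed composition table: opposite polarities neutralize each other,
# neutral features combine freely, and every other pairing is a clash.
COMPOSE = {
    ("+", "-"): "=",
    ("-", "+"): "=",
    ("=", "="): "=",
}

# A node: feature name -> (polarity, atomic value)
Node = dict[str, tuple[str, str]]


def merge(a: Node, b: Node) -> Optional[Node]:
    """Superpose two nodes; return the merged node, or None on a clash."""
    merged = dict(a)
    for feat, (pol_b, val_b) in b.items():
        if feat not in merged:
            merged[feat] = (pol_b, val_b)  # feature only present on b
            continue
        pol_a, val_a = merged[feat]
        if val_a != val_b:  # atomic values must be identical
            return None
        pol = COMPOSE.get((pol_a, pol_b))
        if pol is None:  # e.g. two positive features
            return None
        merged[feat] = (pol, val_a)
    return merged


def is_neutral(node: Node) -> bool:
    """True when no feature of the node still carries a + or - polarity."""
    return all(pol == "=" for pol, _ in node.values())


# Hypothetical example: a verb's object node expects a noun phrase,
# and a noun phrase node provides one.
expected_obj = {"cat": ("-", "np")}
provided_np = {"cat": ("+", "np"), "num": ("=", "sg")}
result = merge(expected_obj, provided_np)
print(result)  # {'cat': ('=', 'np'), 'num': ('=', 'sg')}
print(is_neutral(result))  # True
```

In this sketch a merge fails as soon as two features clash, which is the point where a parser could abandon a branch; a real IG parser would also have to check dominance and precedence constraints after each merge.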
As IGs are based on polarities and underspecified trees, LEOPAR uses specific, non-trivial data structures and algorithms.

The electrostatic principle has been exploited extensively in LEOPAR. The theoretical problem of parsing IGs is NP-complete; the nondeterminism usually associated with NP-completeness is present at two levels: when a description is selected from the lexicon for each word, and when a choice is made of which nodes to merge. Polarities have proved efficient in pruning the search tree at both steps:

• In the first step (tagging the words of the sentence with tree descriptions), we forget the structure of descriptions and keep only the bag of their features. Parsing inside the formalism is then greatly simplified, because composition rules reduce to the neutralization of a negative feature-value pair f ← v by a dual positive feature-value pair f → v. As a consequence, parsing reduces to counting the positive and negative polarities present in the selected tagging for every pair (f, v): every positive occurrence counts for +1 and every negative occurrence for −1, and the sum must be 0.
• In the second step (node-merging phase), polarities are used to cut off parsing branches whose trees contain too many uncancelled polarities.

5.1.2. Current state of the implementation

A first prototype was developed until 2003 by Guillaume Bonfante and Bruno Guillaume. This implementation has many drawbacks and is not maintained.

A new implementation of LEOPAR was started in 2004. Guillaume Bonfante, Bruno Guillaume, Guy Perrier and Sylvain Pogodalla work on this new implementation. The current implementation (17,000 lines of OCaml) provides different running modes:

• automatic parsing of a sentence or a set of sentences;
• manual parsing (the user chooses the pair of nodes to merge);
• visualization of grammars produced by XMG or of the set of tree descriptions associated with a given French word.
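The polarity filter of the tagging step can be sketched as follows, assuming each lexical entry has already been reduced to its bag of polarized (feature, value) pairs. The toy lexicon, the French words and the pseudo-token `<S>` (a hypothetical way of stating that the whole sentence is expected to yield one `s`) are illustrative assumptions. The sketch naively enumerates every tagging, whereas LEOPAR's new tagging algorithm relies on deterministic automata.

```python
from collections import Counter
from itertools import product

# An entry reduced to its bag of polarized features:
# (polarity, feature, value) with polarity +1 (f -> v) or -1 (f <- v).
Entry = list[tuple[int, str, str]]


def balance(tagging: tuple[Entry, ...]) -> Counter:
    """Sum the polarities of every (feature, value) pair in a tagging."""
    total: Counter = Counter()
    for entry in tagging:
        for pol, feat, val in entry:
            total[(feat, val)] += pol
    return total


def is_balanced(tagging: tuple[Entry, ...]) -> bool:
    """A tagging passes the filter when every (f, v) sum is 0."""
    return all(s == 0 for s in balance(tagging).values())


def polarity_filter(lexicon: dict[str, list[Entry]], words: list[str]):
    """Yield the taggings (one entry per word) whose polarities cancel out.

    Naive: enumerates the whole Cartesian product of lexical choices.
    """
    for tagging in product(*(lexicon[w] for w in words)):
        if is_balanced(tagging):
            yield tagging


# Hypothetical toy lexicon; "<S>" states that one sentence (s) is expected.
LEXICON = {
    "<S>": [[(-1, "cat", "s")]],
    "Jean": [[(+1, "cat", "np")]],
    "mange": [
        [(+1, "cat", "s"), (-1, "cat", "np")],                     # intransitive
        [(+1, "cat", "s"), (-1, "cat", "np"), (-1, "cat", "np")],  # transitive
    ],
}

results = list(polarity_filter(LEXICON, ["<S>", "Jean", "mange"]))
print(len(results))  # 1: the transitive reading leaves an expected np unsatisfied
```

Only the intransitive reading of "mange" survives here, because the transitive one sums to −1 on (cat, np); structural checks are left entirely to the later node-merging phase.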
The main improvements with respect to the previous implementation are:

• a finer data structure for tree descriptions: there are now two notions of precedence (direct and large), and there are arity constraints on nodes;
• a new algorithm for the first step (tagging), which uses deterministic automata and provides finer control over the way the filters are applied;
• a new algorithm for the node-merging phase: more constraint propagation is used (hence the search space is reduced);
• grammars created with XMG are now directly usable in LEOPAR;
• a new graphical interface (using GTK), which is useful for debugging grammars.

The current implementation is available on the web (http://www.loria.fr/equipes/calligramme/leopar/) under the CeCILL License (http://www.cecill.info).

The current implementation comes with a medium-coverage grammar for French (the grammar produced with XMG contains 710 tree descriptions). It also includes morphological and syntactic lexicons that cover the French examples of the TSNLP (Test Suites for Natural Language Processing) [38].

5.2. XMG

The eXtensible MetaGrammar [21] (XMG) is a tool for generating large-coverage grammars from concise descriptions of linguistic phenomena (the so-called metagrammar). This software is joint work between Calligramme and Langue et Dialogue and was formerly known as the Metagrammar Workbench.

This software is based on two important concepts from logic programming, namely the Warren Abstract Machine and constraints on finite sets. It has been developed by Benoît Crabbé, Yannick Parmentier, Denys Duchier and Joseph Le Roux. The first release is available at http://sourcesup.cru.fr/xmg. It is now maintained by PhD students Yannick Parmentier and Joseph Le Roux.

At the current stage of implementation, XMG generates Tree Adjoining Grammars and Interaction Grammars, but the underlying formalism is generic, so it could be extended to other grammars such as dependency grammars or lexical functional grammars, depending on users' requests.

XMG has been used to design realistic grammars for French, that is to say grammars covering common linguistic knowledge and phenomena. Guy Perrier wrote an Interaction Grammar that is available with LEOPAR. Benoît Crabbé wrote a Tree Adjoining Grammar inspired by the well-known FTAG, evaluated in less than 3 months. Claire Gardent is using XMG to design a Tree Adjoining Grammar with semantics. Joseph Le Roux is also designing an Interaction Grammar of coordination with XMG.

XMG also has users outside LORIA: Owen Rambow (Columbia University) is implementing a grammar for Arabic designed with XMG, and PhD students from the University of Pennsylvania also work with this tool.

5.3. Linguistic Resource Development

In order to get actual lexicons to run LEOPAR, we needed to develop some lexical resources. The general architecture is the following:

1. Lexical resources are described in two different databases: one for morphological information and the other for syntactic aspects; the two databases are compiled into a morpho-syntactic lexicon that combines both kinds of information. In this compiled lexicon, feature structures are used to represent the morpho-syntactic features associated with each inflected form.
2. From a metagrammar, through XMG (see 5.2), we generate anonymous tree descriptions that can occur in the targeted language (French); each tree description comes with a feature structure (called its interface) that describes how this tree should be anchored in the lexicon database.
3. Finally, we use feature structure unification to combine the grammatical and morpho-syntactic databases.
   When unification between the feature structure of a word (given by the morpho-syntactic database) and the interface of a tree description succeeds, the word anchors the corresponding tree description, which is then fully instantiated.

To this end, in addition to the tools for merging the different kinds of lexicons, Bruno Guillaume and Sylvain Pogodalla have developed a tool that produces LEOPAR-formatted morphological lexicons from external morphological lexicons (for now, from our own verb descriptions and from the morphological lexicon Morphalou provided by the ATILF).

This tool is also used in the concordancers we provide (based on the Test Suites for Natural Language Processing (TSNLP) and on Le tour du Monde en 80 jours (J. Verne)). These concordancers are used in our project-team and in the Langue et Dialogue INRIA project-team to help grammar writers.

Two students at École des Mines (Damien Auricchio and Nelson Da Silva) also worked, during a three-month training period, on factorizing the morphological information of inflected forms and on comparing the UNITEX morphological lexicon with our own verb lexicon.

5.4. ACG related software

A development environment for ACGs is being developed by Bruno Guillaume, Philippe de Groote and Sylvain Pogodalla. Its main features are the abilities to read signatures and lexicons and to realize object terms from abstract ones. This new version integrates the ability to use features in types. Parsing (building abstract terms from object terms) and example grammars are being developed.
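The realization of object terms from abstract ones can be sketched in a few lines under strong simplifications: types are ignored, abstract terms are nested tuples of hypothetical constant names, and each lexicon maps every abstract constant to a Python function standing in for an object lambda-term. Realization is then a homomorphism: the image of an application is the application of the images. The constant names and both lexicons are invented for illustration and say nothing about the input formats of the actual development environment.

```python
from typing import Callable, Union

# Hypothetical abstract term: a constant name, or a tuple
# (constant, argument_1, ..., argument_n) for an application.
Term = Union[str, tuple]
Lexicon = dict[str, Callable[..., str]]

# Hypothetical surface lexicon: abstract constants -> string builders.
SURFACE: Lexicon = {
    "JEAN": lambda: "Jean",
    "MARIE": lambda: "Marie",
    "AIME": lambda subj, obj: f"{subj} aime {obj}",
}

# Hypothetical semantic lexicon over the same abstract constants.
SEMANTICS: Lexicon = {
    "JEAN": lambda: "jean",
    "MARIE": lambda: "marie",
    "AIME": lambda subj, obj: f"love({subj}, {obj})",
}


def realize(term: Term, lexicon: Lexicon) -> str:
    """Map an abstract term to an object term homomorphically:
    the image of an application is the application of the images."""
    if isinstance(term, str):  # constant without arguments
        return lexicon[term]()
    head, *args = term
    return lexicon[head](*(realize(arg, lexicon) for arg in args))


abstract_term = ("AIME", "JEAN", "MARIE")
print(realize(abstract_term, SURFACE))    # Jean aime Marie
print(realize(abstract_term, SEMANTICS))  # love(jean, marie)
```

Running two lexicons over the same abstract term mirrors the ACG idea of one abstract structure with several object interpretations (here a surface string and a logical form); parsing, the inverse direction, is not attempted in this sketch.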
Similar resources
Formal islands: foundations and applications
IN COLLABORATION WITH: Laboratoire lorrain de recherche en informatique et ses applications (LORIA)
Modelling the Growth of Blood Vessels in Health and Disease
The PractiKPharma consortium comprises four academic partners: two computer science laboratories, the LORIA (Laboratoire Lorrain de Recherche en Informatique et ses Applications, Nancy, France) in Nancy, the LIRMM (Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier, France) in Montpellier; and two University Hospitals, the HEGP (The Georges Pompidou European Hospita...
Annotation for knowledge sharing in a collaborative environment
Document Information: Title: Annotation for knowledge sharing in a collaborative environment Author(s): Charles Abiodun Robert, (Based at Laboratoire Lorrain de recherche en informatique et ses applications, LORIA, Vandoeuvre-lès-Nancy, France.) Citation: Charles Abiodun Robert, (2009) "Annotation for knowledge sharing in a collaborative environment", Journal of Knowledge Management, Vol. 13 Is...
Role of Force-cues in Path Following of 3D Trajectories in Virtual Reality
1 Grenoble University, France; 2 Centre National de la Recherche Scientifique (CNRS), Laboratoire de psychologie et NeuroCognition (LPNC), Grenoble, France; 3 CNRS, Laboratoire des Techniques de l’Ingénierie Médicale et de la Complexité-Informatique, Mathématiques et Applications de Grenoble (TIMC-IMAG), Grenoble, France; 4 i3D, Institut National de la Recherche en Informatique et Automatique (INRIA) ...
Stratégies de bandit pour les systèmes de recommandation. (Bandit strategies for recommender systems)
Jury: Ms. Josiane MOTHE – Institut de Recherche en Informatique de Toulouse (IRIT), thesis co-supervisor; Mr. Aurélien GARIVIER – Institut de Mathématiques de Toulouse (IMT), thesis co-supervisor; Mr. Max CHEVALIER – Institut de Recherche en Informatique de Toulouse (IRIT), thesis co-supervisor; Mr. Olivier CAPPÉ – Laboratoire Traitement et Communication de l'Information (LTCI), CNRS, Télécom Par...
VIDA. Une thématique art-science dans un laboratoire de recherche scientifique
From its beginnings, computer science has been a privileged field of application for bringing artistic and scientific questions together and for nurturing creative research as technology advanced. Nevertheless, developing art-science projects within a scientific laboratory remains an unusual undertaking, difficult to promote institutionally even if ...
Publication date: 2005